
test(e2e): add GPU workload image artifacts#1484

Open
elezar wants to merge 1 commit into main from feat/1476-gpu-workload-images/elezar


elezar (Member) commented May 20, 2026

🏗️ build-from-issue-agent

Summary

Define local GPU workload image artifacts for smoke-pass, smoke-fail, and cuda-basic validation. The build task supports Docker or Podman through the existing container-engine helper, tags each image with the source revision, and writes the latest local image refs to latest.env for downstream tests.
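Engine selection of the kind described above could look roughly like the following. This is a minimal sketch, not the existing container-engine helper: the function name select_container_engine, the CONTAINER_ENGINE override variable, and the preference for Docker over Podman are all assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the container-engine helper.
# Prints the engine to use: an explicit CONTAINER_ENGINE override (assumed
# variable name) wins; otherwise prefer docker, then podman.
select_container_engine() {
  local engine
  if [ -n "${CONTAINER_ENGINE:-}" ]; then
    case "$CONTAINER_ENGINE" in
      docker|podman)
        printf '%s\n' "$CONTAINER_ENGINE"
        return 0
        ;;
      *)
        echo "unsupported CONTAINER_ENGINE: $CONTAINER_ENGINE" >&2
        return 1
        ;;
    esac
  fi
  for engine in docker podman; do
    if command -v "$engine" >/dev/null 2>&1; then
      printf '%s\n' "$engine"
      return 0
    fi
  done
  echo "neither docker nor podman found on PATH" >&2
  return 1
}
```

Because docker and podman both accept build -t REF CONTEXT_DIR, the caller can invoke "$(select_container_engine)" build -t "$ref" "$dir" without branching on the engine.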

Related Issue

Closes #1476

Changes

  • e2e/gpu/: documents the workload image contract, direct validation flow, and publish guidance.
  • e2e/gpu/images/smoke-pass: adds a positive marker-only workload image.
  • e2e/gpu/images/smoke-fail: adds a stable negative-path workload image.
  • e2e/gpu/images/cuda-basic: builds CUDA samples deviceQuery and vectorAdd from NVIDIA/cuda-samples v12.8, copies the statically linked binaries into the OpenShell community base image, and runs both validations.
  • tasks/scripts/e2e-gpu-build-images.sh: adds image discovery, subset selection, source-SHA/dirty tagging, Docker/Podman build invocation, and latest.env generation.
  • tasks/test.toml: adds the mise run e2e:gpu:images:build task.
  • .gitignore: ignores generated GPU image build metadata.
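The discovery, source-SHA/dirty tagging, and latest.env steps listed for tasks/scripts/e2e-gpu-build-images.sh can be sketched as below. This is an illustrative sketch, not the script in this PR. The function names, the assumption that each image directory contains a Dockerfile, the 8-character SHA prefix (inferred from tags such as 785872b4), the -dirty suffix, and the KEY=ref layout and variable names in latest.env are all assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical: list image names (directory names) under an images root,
# keeping only directories that contain a Dockerfile (assumed layout).
discover_images() {
  local root="$1" dir
  for dir in "$root"/*/; do
    if [ -f "${dir}Dockerfile" ]; then
      basename "$dir"
    fi
  done
}

# Hypothetical: build a tag from a full commit SHA and a dirty flag.
format_tag() {
  local sha="$1" dirty="$2" tag
  tag="${sha:0:8}"
  if [ "$dirty" = true ]; then
    tag="${tag}-dirty"
  fi
  printf '%s\n' "$tag"
}

# Hypothetical: derive the tag from the current Git checkout. Untracked
# files count as dirty here because git status --porcelain lists them.
source_tag() {
  local sha dirty=false
  sha="$(git rev-parse HEAD)"
  if [ -n "$(git status --porcelain)" ]; then
    dirty=true
  fi
  format_tag "$sha" "$dirty"
}

# Hypothetical: write one KEY=ref line per image; smoke-pass becomes
# GPU_WORKLOAD_SMOKE_PASS (invented naming). The file is sourceable by sh.
write_latest_env() {
  local out="$1" registry="$2" tag="$3" name key
  shift 3
  : > "$out"
  for name in "$@"; do
    key="GPU_WORKLOAD_$(printf '%s' "$name" | tr 'a-z-' 'A-Z_')"
    printf '%s=%s/gpu-workload-%s:%s\n' "$key" "$registry" "$name" "$tag" >> "$out"
  done
}
```

For example, write_latest_env latest.env localhost/openshell 785872b4 smoke-pass would write the single line GPU_WORKLOAD_SMOKE_PASS=localhost/openshell/gpu-workload-smoke-pass:785872b4. A sourceable env file lets downstream tests pick up refs without parsing image listings.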

Deviations from Plan

None. Migration to OpenShell-Community and Git-ref-based builds remain follow-up work, as planned.

Testing

  • mise run e2e:gpu:images:build
  • docker run --rm localhost/openshell/gpu-workload-smoke-pass:785872b4
  • docker run --rm localhost/openshell/gpu-workload-smoke-fail:785872b4 exits 42 with the failure marker
  • docker run --rm --device nvidia.com/gpu=all localhost/openshell/gpu-workload-cuda-basic:785872b4
  • mise run pre-commit
  • mise run e2e:gpu passes 7 GPU selection tests
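For the negative-path image, success means a specific non-zero exit status, so a plain "command failed" check is not enough. A small wrapper like the one below makes that explicit. It is a sketch: expect_exit is a hypothetical helper name, and it checks only the status code, not the failure marker text.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command and succeed only if it exits with the
# expected status. For smoke-fail, exit 42 is the passing outcome.
expect_exit() {
  local expected="$1" actual=0
  shift
  # The || keeps set -e from aborting when the command fails.
  "$@" || actual=$?
  if [ "$actual" -ne "$expected" ]; then
    echo "expected exit $expected, got $actual: $*" >&2
    return 1
  fi
}
```

Applied to the commands above, this would read expect_exit 42 docker run --rm localhost/openshell/gpu-workload-smoke-fail:785872b4 and expect_exit 0 docker run --rm localhost/openshell/gpu-workload-smoke-pass:785872b4.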

Tests added:

  • Unit: N/A; this PR adds image artifacts and task wiring.
  • Integration: N/A.
  • E2E: Adds GPU workload image artifacts intended for e2e validation; direct Docker/CDI validation and the existing Docker GPU e2e lane were run locally.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

Documentation updated:

  • e2e/gpu/README.md: workload contract, build task, local validation, and publish guidance.
  • e2e/gpu/images/*/README.md: per-image purpose, build, and direct-run instructions.

Closes #1476

Define local GPU workload image sources for smoke-pass, smoke-fail, and cuda-basic validation, plus a mise build task that tags images with the source revision and records the latest local image refs.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
elezar (Member, Author) commented May 20, 2026

🏗️ build-from-issue-agent

E2E Test Attestation

Local E2E tests passed. CI does not currently run this GPU host validation, so this comment records the local run.

Field          Value
Commit         785872b41279978e5d95c721b6785c776287702d
Command        mise run e2e:gpu
Gateway mode   Docker
Result         PASS

Test Summary

7 passed; 0 failed; 0 ignored; finished in 5.92s

Tests Executed

  • gpu_device_selection::gpu_invalid_device_request_fails - PASSED
  • gpu_device_selection::parse_cdi_gpu_device_ids_ignores_unexpected_nested_devices - PASSED
  • gpu_device_selection::parse_cdi_gpu_device_ids_reads_discovered_devices - PASSED
  • gpu_device_selection::parse_cdi_gpu_device_ids_reads_lowercase_host_discovered_devices - PASSED
  • gpu_device_selection::gpu_request_without_device_matches_plain_all_gpu_container - PASSED
  • gpu_device_selection::gpu_all_device_request_matches_plain_all_gpu_container - PASSED
  • gpu_device_selection::gpu_request_for_each_discovered_device_matches_plain_container - PASSED

Direct Image Validation

  • mise run e2e:gpu:images:build built localhost/openshell/gpu-workload-smoke-pass:785872b4, localhost/openshell/gpu-workload-smoke-fail:785872b4, and localhost/openshell/gpu-workload-cuda-basic:785872b4.
  • docker run --rm localhost/openshell/gpu-workload-smoke-pass:785872b4 passed.
  • docker run --rm localhost/openshell/gpu-workload-smoke-fail:785872b4 exited 42 with the expected failure marker.
  • docker run --rm --device nvidia.com/gpu=all localhost/openshell/gpu-workload-cuda-basic:785872b4 passed on the local NVIDIA L4 host.
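The cuda-basic run requests nvidia.com/gpu=all through CDI, so it only makes sense on a host where that device name has been generated. A preflight check could look like this. It is a sketch under assumptions: has_cdi_device is a hypothetical name, and it expects a device listing with one fully qualified device name per line on stdin, which is the shape nvidia-ctk cdi list is expected to print on hosts with the NVIDIA Container Toolkit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical preflight: succeed if the CDI device listing on stdin contains
# a line that exactly equals $1 (for example nvidia.com/gpu=all).
# grep is used without -q so it consumes all input; an early exit could
# SIGPIPE the producer and fail the pipeline under pipefail.
has_cdi_device() {
  grep -Fx -- "$1" >/dev/null
}

# Example on a GPU host (assumes nvidia-ctk is installed):
#   if nvidia-ctk cdi list 2>/dev/null | has_cdi_device nvidia.com/gpu=all; then
#     docker run --rm --device nvidia.com/gpu=all localhost/openshell/gpu-workload-cuda-basic:785872b4
#   else
#     echo "skipping cuda-basic: no CDI device nvidia.com/gpu=all" >&2
#   fi
```

The exact whole-line match (-F -x) avoids treating a name such as nvidia.com/gpu=all-extra as a hit.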



Development

Successfully merging this pull request may close these issues.

test(e2e): define GPU validation image artifacts
